Gray-box Adversarial Attack of Deep Reinforcement Learning-based Trading Agents
In recent years, deep reinforcement learning (Deep RL) has been successfully
implemented as a smart agent in many systems such as complex games,
self-driving cars, and chat-bots. One of the interesting use cases of Deep RL
is its application as an automated stock trading agent. In general, any
automated trading agent is prone to manipulations by adversaries in the trading
environment. Thus studying their robustness is vital for their success in
practice. However, the typical mechanism for studying RL robustness, which is based on
white-box gradient-based adversarial sample generation techniques (like FGSM),
is not applicable to this use case, since the models are protected behind secure
international exchange APIs, such as NASDAQ. In this research, we demonstrate
that a "gray-box" approach for attacking a Deep RL-based trading agent is
possible by trading in the same stock market, with no extra access to the
trading agent. In our proposed approach, an adversary agent uses a hybrid Deep
Neural Network as its policy consisting of Convolutional layers and
fully-connected layers. On average, over three simulated trading market
configurations, the adversary policy proposed in this research is able to
reduce the reward values by 214.17%, which results in reducing the potential
profits of the baseline by 139.4%, ensemble method by 93.7%, and an automated
trading software developed by our industrial partner by 85.5%, while consuming
significantly less budget than the victims (427.77%, 187.16%, and 66.97%,
respectively).
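
For context on the white-box technique that the abstract above sets aside, here is a minimal sketch of an FGSM-style perturbation. It assumes a toy logistic-regression model whose weights are fully visible to the attacker; the function names and the model are illustrative and are not taken from the paper. It shows why such attacks need access to model internals that a trader on a public exchange would not have.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def fgsm_perturb(x, y, weights, bias, epsilon):
    """Return x + epsilon * sign(dLoss/dx) for a toy logistic-regression model.

    Assumption: the attacker can read `weights` and `bias` (white-box access).
    """
    z = sum(w * xi for w, xi in zip(weights, x)) + bias
    p = sigmoid(z)
    # Gradient of the binary cross-entropy loss with respect to the input.
    grad = [(p - y) * w for w in weights]
    # Step each feature by epsilon in the direction that increases the loss.
    return [xi + epsilon * ((g > 0) - (g < 0)) for xi, g in zip(x, grad)]
```

The gradient step requires the model parameters, which is exactly the access a gray-box adversary trading in the same market lacks.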
Log-based Anomaly Detection of Enterprise Software: An Empirical Study
Most enterprise applications use logging as a mechanism to diagnose
anomalies, which could help with reducing system downtime. Anomaly detection
using software execution logs has been explored in several prior studies, using
both classical and deep neural network-based machine learning models. In recent
years, the research has largely focused on using variations of sequence-based
deep neural networks (e.g., Long Short-Term Memory and Transformer-based
models) for log-based anomaly detection on open-source data. However, they have
not been applied to industrial datasets as often. In addition, the studied
open-source datasets are typically very large in size with logging statements
that do not change much over time, which may not be the case with a dataset
from an industrial service that is relatively new. In this paper, we evaluate
several state-of-the-art anomaly detection models on an industrial dataset from
our research partner, which is much smaller and more loosely structured than most
large-scale open-source benchmark datasets. Results show that while all models
are capable of detecting anomalies, certain models are better suited for
less-structured datasets. We also see that model effectiveness changes when a
common data leak associated with a random train-test split in some prior work
is removed. A qualitative study of the defects' characteristics identified by
the developers on the industrial dataset further shows strengths and weaknesses
of the models in detecting different types of anomalies. Finally, we explore
the effect of limited training data by gradually increasing the training set
size, to evaluate whether model effectiveness depends on the training set
size. Comment: 12 pages, 14 figures. Submitted to QRS 2023 - 23rd IEEE International
Conference on Software Quality, Reliability and Security
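
The data leak mentioned in the abstract above comes from splitting log data at random. One common way to avoid this kind of leak is to split chronologically, so that every training record precedes every test record. The sketch below assumes each log record is a dict with a hypothetical `timestamp` key; it is not necessarily the procedure used in the paper.

```python
def chronological_split(records, train_fraction=0.8):
    """Split records so that all training records precede all test records.

    Assumption: each record is a dict with a sortable "timestamp" field.
    """
    ordered = sorted(records, key=lambda r: r["timestamp"])
    cut = int(len(ordered) * train_fraction)
    return ordered[:cut], ordered[cut:]
```

A random split would instead mix later log sequences into the training set, which can inflate the measured effectiveness.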
Method-Level Bug Severity Prediction using Source Code Metrics and LLMs
  In the past couple of decades, significant research efforts have been devoted to
the prediction of software bugs. However, most existing work in this domain
treats all bugs the same, which is not the case in practice. It is important
for a defect prediction method to estimate the severity of the identified bugs
so that the higher-severity ones get immediate attention. In this study, we
investigate source code metrics, source code representation using large
language models (LLMs), and their combination in predicting bug severity labels
of two prominent datasets. We leverage several source code metrics at method-level
granularity to train eight different machine-learning models. Our results
suggest that Decision Tree and Random Forest models outperform other models
across several evaluation metrics. We then use the pre-trained CodeBERT
LLM to study the source code representations' effectiveness in predicting bug
severity. CodeBERT fine-tuning improves the bug severity prediction results
significantly, in the range of 29%-140% for several evaluation metrics, compared
to the best classic prediction model on source code metrics. Finally, we
integrate source code metrics into CodeBERT as an additional input, using our
two proposed architectures, which both enhance the CodeBERT model
effectiveness.
An IR-based Approach Towards Automated Integration of Geo-spatial Datasets in Map-based Software Systems
Data is arguably the most valuable asset of the modern world. In this era,
the success of any data-intensive solution relies on the quality of data that
drives it. Among the vast amounts of data that are captured, managed, and analyzed
every day, geospatial data are one of the most interesting classes of data: they
hold geographical information about real-world phenomena and can be visualized as
digital maps. Geospatial data are the source of many enterprise solutions that
provide local information and insights. In order to increase the quality of
such solutions, companies continuously aggregate geospatial datasets from
various sources. However, the lack of a global standard model for geospatial
datasets makes the task of merging and integrating datasets difficult and
error-prone. Traditionally, domain experts manually validate the data
integration process by merging new data sources and/or new versions of previous
data against conflicts and other requirement violations. However, this approach
is not scalable and hinders rapid release when dealing with
frequently changing big datasets. Thus, more automated approaches with limited
interaction with domain experts are required. As a first step to tackle this
problem, in this paper, we leverage Information Retrieval (IR) and geospatial
search techniques to propose a systematic and automated conflict identification
approach. To evaluate our approach, we conduct a case study in which we measure
the accuracy of our approach in several real-world scenarios and interview
software developers at Localintel Inc. (our industry partner) to get their
feedback. Comment: ESEC/FSE 2019 - Industry track
Improving the Performance of DNN-based Software Services using Automated Layer Caching
Deep Neural Networks (DNNs) have become an essential component in many
application domains including web-based services. A variety of these services
require high throughput and (close to) real-time features, for instance, to
respond or react to users' requests or to process a stream of incoming data on
time. However, the trend in DNN design is toward larger models with many layers
and parameters to achieve more accurate results. Although these models are
often pre-trained, the computational complexity in such large models can still
be relatively significant, hindering low inference latency. Implementing a
caching mechanism is a typical systems engineering solution for speeding up a
service response time. However, traditional caching is often not suitable for
DNN-based services. In this paper, we propose an end-to-end automated solution
to improve the performance of DNN-based services in terms of their
computational complexity and inference latency. Our caching method adopts the
ideas of self-distillation of DNN models and early exits. The proposed solution
is an automated online layer caching mechanism that allows early exiting of a
large model during inference time if the cache model in one of the early exits
is confident enough for final prediction. One of the main contributions of this
paper is that we have implemented the idea as an online caching, meaning that
the cache models do not need access to training data and perform solely based
on the incoming data at run-time, making it suitable for applications using
pre-trained models. Our experimental results on two downstream tasks (face and
object classification) show that, on average, caching can reduce the
computational complexity of those services by up to 58% (in terms of FLOPs count)
and improve their inference latency by up to 46% with low to zero reduction in
accuracy.
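
The early-exit idea described in the abstract above can be sketched as a confidence check at each exit. This is a simplified illustration under stated assumptions: the exit heads and the full model are plain callables that map an input to logits, and the threshold of 0.9 is arbitrary. In a real DNN the heads would read intermediate layer activations, and the paper's actual mechanism may differ.

```python
import math


def softmax(logits):
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def predict_with_early_exit(x, exit_heads, full_model, threshold=0.9):
    """Return (predicted_class, exit_index), or (predicted_class, None)
    when no early exit was confident enough and the full model was used.

    Assumption: each head and the full model are callables returning logits.
    """
    for index, head in enumerate(exit_heads):
        probs = softmax(head(x))
        confidence = max(probs)
        if confidence >= threshold:
            # The cache model at this exit is confident: skip later layers.
            return probs.index(confidence), index
    probs = softmax(full_model(x))
    return probs.index(max(probs)), None
```

The threshold controls the trade-off: a lower value exits earlier and saves more computation, at a higher risk of a wrong prediction.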
Automated Test Case Generation Using Code Models and Domain Adaptation
State-of-the-art automated test generation techniques, such as search-based
testing, are usually unaware of what a developer would create as a test
case. Therefore, they typically create tests that are not human-readable and
may not detect all the types of complex bugs that developer-written tests
would. In this study, we leverage Transformer-based code models to generate
unit tests that can complement search-based test generation. Specifically, we
use CodeT5, i.e., a state-of-the-art large code model, and fine-tune it on the
test generation downstream task. For our analysis, we use the Methods2test
dataset for fine-tuning CodeT5 and Defects4j for project-level domain
adaptation and evaluation. The main contribution of this study is proposing a
fully automated testing framework that leverages developer-written tests and
available code models to generate compilable, human-readable unit tests.
Results show that our approach can generate new test cases that cover lines
that were not covered by developer-written tests. Using domain adaptation, we
can also increase line coverage of the model-generated unit tests by 49.9% and
54% in terms of mean and median (compared to the model without domain
adaptation). We can also use our framework as a complementary solution
alongside common search-based methods to increase the overall coverage by a
mean and median of 25.3% and 6.3%, respectively. It can also increase the mutation score of
search-based methods by killing extra mutants (up to 64 new mutants were killed
per project in our experiments). Comment: 10 pages + references
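
The notion of generated tests covering lines that developer-written tests miss, as described in the abstract above, can be expressed as a set difference. The sketch below assumes coverage is available as a mapping from file name to a set of covered line numbers; this data shape and the function name are hypothetical and are not taken from the paper's tooling.

```python
def newly_covered_lines(developer_cov, generated_cov):
    """Return, per file, the lines covered only by the generated tests.

    Assumption: both arguments map file names to sets of line numbers.
    """
    result = {}
    for file_name, lines in generated_cov.items():
        extra = lines - developer_cov.get(file_name, set())
        if extra:
            result[file_name] = sorted(extra)
    return result
```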